UNCERTAINTY AND THE CONDITIONING OF
BELIEFS


Henry E. Kyburg, Jr.


Abstract. Uncertainty is part of the human condition. Whether we will or no, we must
act, we must make decisions, in the face of uncertainty. Some authors have proposed
that uncertainty be regarded as essentially a subjective matter. Our first goal is to
draw the teeth of the classical subjectivistic argument that one must be prepared to
meet all bets on the basis of one's "degrees of belief." The Dutch book theorem, which
purports to have this as a consequence, is stated and criticized. Other criticisms of
logical and subjective probability are considered. This leads to the consideration of
alternative conceptions of how to represent epistemic uncertainty. A variety of
alternatives have been offered, including, recently, Glenn Shafer's theory of belief
functions. An exposition of Shafer's theory is offered. We then relate Shafer's theory
of belief functions to a theory that represents (and updates) uncertainty in terms of
convex sets of classical probability functions. Finally, we discuss the question of the
decision principles that can be employed with both the convex set
representation and the belief function representation of uncertainty.


I. BACKGROUND


It is a fact of life, whether we applaud it or deplore it, that we
must decide and act in the face of uncertainty and on the basis of
incomplete information. It is argued by some philosophers that if we
had complete information, we would not have to act in the face of
uncertainty; but it has also been argued by others (and by philosophers
of quantum mechanics in particular) that even if we had full
information, we would not be able to eliminate uncertainty.

It is true that if complete knowledge included knowledge of the
future, then having complete knowledge would spare us from facing
uncertainty. We mortals have been seeking that kind of knowledge for
centuries: in the stars, in chicken entrails, in science. We have not
found it. When, long ago, we believed that the gods knew the future and
told the truth, we also understood that the oracles spoke in riddles.
Thus, though we had been told what the future would bring, our
interpretation of what we had been told introduced a new level of
uncertainty.

Although probability theory has developed only in very recent
historical times, people have had some understanding of the practical
aspects of uncertainty for as long as gambling has been a pastime. It is
an important and interesting modern question to ask to what extent the
gambling model of action in the face of uncertainty has general validity.

A number of authors (philosophers, statisticians, and probability
theorists) have drawn a distinction between the kind of uncertainty that
characterizes our general knowledge of the world, and the kind of
uncertainty that we discover in gambling. This distinction appeals to
intuition. If the dice are fair, the chance of two ones on a roll of two
dice is 1/36. But what is the chance that the dice are fair? That seems
quite a different question.

Taking the distinction seriously has led to two dominant views
concerning probability and uncertainty. One identifies probability with
long-run relative frequency. This view was given explicit articulation by
John Venn (1866), though Aristotle, who said that what was probable
was that which happened for the most part, might also be taken as a
frequency theorist. Its best known advocate was the positivist Richard
von Mises (1928). This view appears to account for the assessment of
the chance of getting the sum two in a roll of two dice: that result
happens about 1/36 of the time in the long run; therefore the probability
should be taken to be 1/36.
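To make the frequency reading concrete, here is a minimal Python sketch that assumes two fair, independent dice. It first counts the 36 equally likely outcomes exactly, then estimates the long-run relative frequency of the sum two by simulation. The function name `relative_frequency`, the seed, and the number of rolls are illustrative choices of ours, not part of the original discussion.

```python
import random
from fractions import Fraction
from itertools import product

# Exact count: of the 36 equally likely outcomes of two fair dice,
# how many give the sum two (a one on each die)?
outcomes = list(product(range(1, 7), repeat=2))
exact = Fraction(sum(1 for a, b in outcomes if a + b == 2), len(outcomes))

def relative_frequency(n_rolls: int, seed: int = 0) -> float:
    """Relative frequency of the sum two in n_rolls simulated rolls of two fair dice."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    hits = sum(1 for _ in range(n_rolls)
               if rng.randint(1, 6) + rng.randint(1, 6) == 2)
    return hits / n_rolls

print(exact)                        # 1/36
print(relative_frequency(100_000))  # close to 1/36, about 0.028
```

As the number of rolls grows, the simulated frequency settles near 1/36; on the frequency view, that limiting value is what the probability is.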

On the other hand, dice can be more or less fair; and there is no
well-established, agreed-upon relative frequency with which dice are
unfair. This seems to be quite a different problem. And so a different
conception of probability has been devised to deal with it.

More accurately, two different conceptions. For an early hope,
articulated by John Maynard Keynes (1921), was that it should be
possible to define a logical conception of probability that would measure
the degree of uncertainty of any hypothesis on any evidence. (Others
had already conceived of a notion of probability that would be
epistemic, i.e., that would determine the rational degree of belief of an
agent possessed of given evidence.) Thus, given what we know about
dice, social customs, physics, the interests of our friends, the state of the
economy, and so on, there would be one logically fixed probability for
the hypothesis that a given die is loaded to a given degree. Keynes
supposed that there was such a probability, fixed and determined by
background knowledge and evidence, but he did not assume that it was
a real number in the interval [0, 1]. In particular, he thought there was
good reason to suppose that sometimes probabilities were not
comparable: taking what I know about the world as evidence, I cannot
say whether rain tomorrow is more probable than, less probable than,
or as probable as, the occurrence of heads on the next toss of this coin.
This is not through any failure of logical insight, or weakness of
intellect. It is simply that the abstract objects we call "probabilities" are
not simply ordered, but only partially ordered. They form a lattice,
whose supremum and infimum are 1 and 0, but in which there are many
non-comparable pairs.

The idea of a lattice of probability values was pursued briefly by
B. O. Koopman (1940), and then disappeared until the late 1950s, to be
revived under a different name.

Meanwhile, a number of writers continued to pursue the idea of
probability as a logical relation. Foremost among these was Rudolf
Carnap (1950). The idea here is this: given a formal language, there is
an intuitively correct assignment of real-valued measures m to its
sentences such that if h is a hypothesis, and e is our total store of
evidence, the probability (legislative for rational belief) of h conditional
on e is m(h & e)/m(e). This is Keynes' vision, formalized and
simplified by the assumption that probabilities are real numbers in the
unit interval.

Others (e.g., Harold Jeffreys, 1939; Jaakko Hintikka, 1966; Ilkka
Niiniluoto, 1976) have pursued this vision. It has turned out to be a
complicated job to assign measures to all the sentences of a complicated
and general language. The only feasible way of doing it seems to be to
parametrize the language (number of one-place predicates, impact of
canonical evidence on a canonical assertion, etc.). But then, if the
assignment is to be "rational," and defensible as rational, we must ask
the question: Why should these parameters have the values we have
given them? The answers have been hard to find.
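A toy computation shows both the formula m(h & e)/m(e) and the parameter problem. The sketch below assumes, for illustration only, a language with a single one-place predicate F and three individuals a, b, c, and compares two of Carnap's standard measures: the one that weights every state description equally (what Carnap called m†) and the one that weights every structure description equally (m*). The function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 3  # individuals a, b, c; the language has a single one-place predicate F

# A state description says, for each individual, whether it is F.
worlds = list(product([True, False], repeat=N))

def m_dagger(world):
    """Equal measure for every state description."""
    return Fraction(1, len(worlds))

def m_star(world):
    """Equal measure for every structure description (how many individuals are F),
    shared equally among the state descriptions that realize it."""
    k = sum(world)
    return Fraction(1, N + 1) / comb(N, k)

def conditional(m, h, e):
    """Probability of h on e: m(h & e) / m(e)."""
    m_e = sum(m(w) for w in worlds if e(w))
    m_he = sum(m(w) for w in worlds if h(w) and e(w))
    return m_he / m_e

def evidence(w):    # Fa & Fb
    return w[0] and w[1]

def hypothesis(w):  # Fc
    return w[2]

print(conditional(m_dagger, hypothesis, evidence))  # 1/2
print(conditional(m_star, hypothesis, evidence))    # 3/4
```

Under the first measure, learning that a and b are F leaves the probability that c is F at 1/2; under the second, it rises to 3/4. Which answer is "rational" is precisely the question of which parameters to choose.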

Shortly after Keynes had proposed his logical view of
probability, Frank Ramsey, a colleague of Keynes' at Cambridge,
criticized it from the point of view of what has come to be called
personalistic Bayesianism (Ramsey, [1931] 1950). Ramsey argued that
there was no point in saying that something was "legislative for rational
belief" unless you could measure belief. Ramsey devised a pragmatic
(or operational) method for measuring belief according to which there
was a clear argument (we leave aside here the question of its validity)
that beliefs should be real-valued and should conform to the probability
calculus. He could find no argument that they should satisfy any other
constraints. Thus he rejected the logical conception of probability in
favor of a subjectivistic conception.

Bruno de Finetti (1937) and L. J. Savage (1954), both
statisticians, also endorsed the view that such probabilities as the
probability that the die is biased could only be subjective. Of course
this is not to say that such probabilities do not depend on evidence; it is
only to say that it is some individual who evaluates the evidence, and
that there is no reason that you and I should both evaluate the same
evidence in the same way. From their views a very lively tradition has
evolved. It is called "Bayesianism," though it is not Bayes theorem that
is at issue.

Bayes theorem is a theorem of the conventional probability
calculus. It says that the probability of a hypothesis, relative to some
evidence, is the prior probability of that hypothesis, multiplied by the
probability of the evidence on the supposition that the hypothesis is
true, divided by the prior probability of the evidence. If we can suppose
that we have a number of exhaustive and exclusive alternatives that can
be taken as hypotheses, the prior probability of the evidence can be
taken as a sum of terms consisting of the prior probability of an
alternative hypothesis, multiplied by the conditional probability of the
evidence, given that hypothesis. Since it is generally (but perhaps
inadvisedly) supposed that the probability of a piece of evidence, given a
statistical hypothesis concerning evidence of that sort, is unproblematic,
the serious question for the Bayesian point of view is the source and
status of the prior probabilities of the hypotheses.
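The computation just described fits in a few lines. In this sketch, which assumes exhaustive and exclusive hypotheses, the likelihoods P(e | h) play the part of the supposedly unproblematic statistical input. The priors are hypothetical numbers chosen for illustration; they stand for exactly the quantities whose source and status are in question.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes theorem over exhaustive, exclusive hypotheses.

    priors[h] is P(h); likelihoods[h] is P(e | h). Returns P(h | e) for each h.
    """
    # Prior probability of the evidence: sum over h of P(h) * P(e | h).
    p_e = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / p_e for h in priors}

# Hypothetical example: the die is either fair or loaded to show six half the time.
priors = {"fair": Fraction(9, 10), "loaded": Fraction(1, 10)}
likelihood_of_six = {"fair": Fraction(1, 6), "loaded": Fraction(1, 2)}

print(posterior(priors, likelihood_of_six))
# {'fair': Fraction(3, 4), 'loaded': Fraction(1, 4)}
```

Changing only the priors changes the answer, even though the likelihoods stay fixed.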

Ramsey's solution is that rationality imposes no constraint. A
man may have whatever degrees of belief he will, provided only that
they satisfy the constraints of the probability calculus. Some writers
suppose that prior probabilities are determined by some general
principle (e.g., the maximum entropy, or least information, principle;
E. T. Jaynes, 1968), but the application of the general principle
depends on the "formulation of the problem," which is again a relatively
subjective matter. Logical theorists, as already noted, require the
specification of parameters in order to determine the prior probabilities
of hypotheses.

By Ramsey's Dutch book argument, these are all the alternatives
there are. Ramsey's argument is that you should have degrees of belief
such that you could accept all bets offered at odds corresponding to your
degrees of belief without having a Dutch book (a set of bets that entails
that you lose whatever happens) made against you.
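A Dutch book is easy to exhibit once betting quotients violate the probability calculus. This hypothetical sketch assumes the usual convention that a bet on S at quotient q with stake s costs q times s and returns s if S is true, and that the bookie chooses the stakes (a negative stake puts the agent on the other side). An agent whose quotients for S and for ~S are both 0.6, summing to more than 1, loses the same amount whichever way S turns out.

```python
from fractions import Fraction

def net_payoff(bets, world):
    """Agent's net gain in `world` from a list of (event, quotient, stake) bets.

    A bet on `event` at quotient q with stake s costs q * s and returns s if the
    event obtains; a negative stake means the agent takes the other side.
    """
    return sum(stake * ((1 if event(world) else 0) - q) for event, q, stake in bets)

def S(world):       # a world is represented simply by the truth value of S
    return world

def not_S(world):
    return not world

# Incoherent quotients: 0.6 for S and 0.6 for ~S.
book = [(S, Fraction(3, 5), 1), (not_S, Fraction(3, 5), 1)]
print([net_payoff(book, w) for w in (True, False)])
# [Fraction(-1, 5), Fraction(-1, 5)]
```

With coherent quotients such as 0.4 for S and 0.6 for ~S, no choice of stakes makes both payoffs negative: the two payoffs are proportional to a - b and b - a for stakes a and b.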

It follows that probabilities are real-valued. And it follows that
they must be updated by Bayes theorem: i.e., that there must be prior
probabilities for every hypothesis. But these probabilities must then be
subjective (Ramsey's view) or they must be obtained systematically,
according to general principles (the logical view, the maximum entropy
view). But in the latter cases there are important parameters that are
just as subjective as Ramsey's degrees of belief.

To avoid this conclusion, and the arbitrariness it embodies, we
must draw the teeth of Ramsey's argument (or find compelling rational
principles that do not require subjective judgment).

Although the issues involved can be complex (see Fahiem
Bacchus, Kyburg, and Mariam Thalos, 1989), the basic idea is simple.
To be sure, it is irrational to accept a set of bets according to which you
lose something you value no matter what happens. But this fact about
rationality says nothing about degrees of belief. The crucial connection
to degrees of belief is the part of the argument that identifies one's
degree of belief in a statement S with the least odds at which one would
bet on S. But it is not at all obvious that one has degrees of belief, or
that they are associated with the odds at which one is willing to bet in
the way that Ramsey suggests.

Specifically, while it seems reasonable to say that the least odds
at which I am willing to bet on S represent a kind of lower bound of my
belief in S, and similarly for the greatest odds at which I am willing to
take a bet on S, it is not at all obvious that these two sets of odds should
be complementary. If I am unsure about S, I may well offer odds of 1
to 2 on S, and odds of 1 to 2 against S, without being willing to offer any
intermediate odds on either.
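The point can be checked mechanically. This sketch assumes that "odds of 1 to 2 on S" means staking 1 to win 2, a betting quotient of 1/3, and computes the agent's payoffs for every pair of stakes on a small grid. If the agent only accepts bets at these odds, no combination makes her lose in both worlds. If, as Ramsey's identification of belief with betting odds requires, she must also take the other side at the same odds, a sure loss appears. The grid search is an illustration, not a proof; with nonnegative stakes a and b the payoffs are (2a - b)/3 and (2b - a)/3, which cannot both be negative.

```python
from fractions import Fraction
from itertools import product

# Odds of 1 to 2 on S: stake 1 to win 2, a betting quotient of 1/3.
Q_S = Fraction(1, 3)
Q_NOT_S = Fraction(1, 3)  # and the same odds against S

def payoffs(stake_S, stake_not_S):
    """Agent's net gain (if S, if not S); a negative stake means the agent
    takes the other side of the bet at the same odds."""
    if_S = stake_S * (1 - Q_S) - stake_not_S * Q_NOT_S
    if_not_S = -stake_S * Q_S + stake_not_S * (1 - Q_NOT_S)
    return if_S, if_not_S

def sure_loss_exists(stakes):
    """Search a grid of stake pairs for one that loses in both worlds."""
    return any(all(g < 0 for g in payoffs(a, b)) for a, b in product(stakes, repeat=2))

accept_only = [Fraction(n, 4) for n in range(0, 9)]   # 0, 1/4, ..., 2
either_side = [Fraction(n, 4) for n in range(-8, 9)]  # -2, ..., 2

print(sure_loss_exists(accept_only))  # False
print(sure_loss_exists(either_side))  # True
```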

Another approach to determining the basic properties of
probability is the analytic approach exemplified by Richard T. Cox
(1961). It turns out that the most innocuous and harmless-sounding
conditions imposed on uncertainty can be shown to lead directly to the
conventional probability calculus. Among these conditions is, of course,
something akin to simple order among probabilities.

One should remember, at this point, the basic distinction that
has given rise to these problems: the distinction between probabilities
that can be construed as frequencies, and probabilities that cannot be so
construed. We shall see later (in section IV) that this is not as simple
a distinction as it appears to be.


II. VARIANTS ON PROBABILITY


There are a number of objections to the classical probability
calculus as a representation of uncertainty. Among them are these:

1. Strictly speaking, frequencies only apply to classes or
predicates. One can speak of the frequency of heads on
tosses of this coin, but not usefully of the frequency of
heads on the next toss of this coin.

2. Many of the events whose probability we wish to
speak of (the probability that an individual exhibiting a
unique background and cluster of symptoms has a
certain disease) are not related in any obvious way to
statistical knowledge.

3. Subjective and logical interpretations of probability
give us numbers, but they are arbitrary. The numbers
provided by a logical view reflect arbitrary general
assignments to the sentences of an artificial language.
The numbers provided by a subjectivistic view may (for
all the theory can say) reflect mere whimsy.

4. None of the theories provides a representation that
can indicate directly that a probability is unknown or
poorly known: that is, that can indicate the difference
between the probability of heads on the next toss of a
well-tested coin, and the probability of heads on a
totally unknown coin: both may be represented by the
number 0.500. (The difference is indicated indirectly by
the conditional probability of heads given heads.)

5. Bayesian and logical views often require the
assignment of probabilities to a great many entities.
Thus in computing the conditional probability of H
given E, we may require the probability of E on every
alternative hypothesis to H.
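The indirect difference mentioned in item 4 can be seen under one common modelling assumption: a Beta(a, b) prior over the coin's unknown chance of heads. This choice is ours, for illustration, and is not a claim about any of the theories listed. Both coins below give heads a probability of exactly 1/2 on the next toss, but they respond very differently to a single observed head.

```python
from fractions import Fraction

def heads_probabilities(a, b):
    """Under a Beta(a, b) prior on the coin's chance of heads (an illustrative
    assumption), return P(heads on the next toss) and
    P(heads on the second toss | heads on the first)."""
    p_heads = Fraction(a, a + b)
    p_heads_given_heads = Fraction(a + 1, a + b + 1)
    return p_heads, p_heads_given_heads

well_tested = heads_probabilities(1000, 1000)  # prior sharply peaked at 1/2
unknown = heads_probabilities(1, 1)            # uniform prior over the chance

print(well_tested)  # (Fraction(1, 2), Fraction(1001, 2001))
print(unknown)      # (Fraction(1, 2), Fraction(2, 3))
```

The single number 1/2 is the same in both cases; only the conditional probability reveals how poorly the second coin is known.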

A number of philosophers, including Karl Popper (1959),
Nicholas Rescher (1958), and Carl Hempel and Paul Oppenheim (1945),
have offered measures of evidential or factual support. These measures
are not probabilities, though they are relatively simple functions of
probabilities. (For a table exhibiting their relations, and the ways in
which they are related to conventional probability measures, see
Kyburg, 1970.)

These measures are designed explicitly to guide our beliefs with
respect to general hypotheses: e.g., the hypothesis that the die is biased
in a certain way, the hypothesis that all A's are B's, the Newtonian
hypothesis (or theory) governing celestial motions, the hypothesis that
fewer than 30% of the A's are B's. Of course these are exactly the sorts
of hypotheses whose probabilities one needs to feed into Bayes theorem.
What happens when we use these numbers in a decision-theoretic
context?

As soon as we try to use such measures in a decision-theoretic
context, Ramsey's (or Cox's) arguments apply with full force. Here we have
no question of merely representing the open and vague and ambiguous
notion of belief; here we have a straightforward matter of decision
involving (presumably) well-specified utilities. It may well be that my
psychological state concerning whether drug A will relieve the symptoms
of patient P is best represented by a vector. But that is another matter.

The most intuitive way of associating degrees of belief with
numbers, employed by Savage (1954), is this: What is the most you
would pay for a ticket that would yield a dollar if S is true? That is your
probability for S. On all of these variant views, the support of a
hypothesis is supposed to be real-valued, and normalized to the [0, 1]
interval. If the numbers can be used to weight utilities in a
decision-theoretic context, then it follows from Ramsey's arguments (among
others) that they must satisfy the axioms of the probability calculus.
That is, the measures purporting to be variants on probability cannot be
viable if they lead to a book being made against one. Or else they cannot be
taken as guiding our decisions in the face of uncertainty.

A similar story may be told about Artificial Intelligence. Expert
systems, it is clear, must be capable of handling uncertainty. Various
systems have employed various representations of uncertainty. For
example, MYCIN (E. H. Shortliffe, 1976) is an expert system designed
to provide assistance in medical diagnosis. The certainty factors of
MYCIN range from -1.0 to 1.0, where -1.0 applied to S
represents full confidence that S is false, and 1.0 applied to S means full
confidence that S is true. In the process of inference, certainty factors
are combined according to special rules.
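For concreteness, here is the parallel-combination rule usually quoted for MYCIN-style certainty factors, in the form later used by EMYCIN. We reproduce it from secondary descriptions, so treat the exact formulas as an assumption rather than a transcription of Shortliffe (1976). The rule takes two factors bearing on the same hypothesis and returns a third in [-1, 1], with no reference to prior probabilities.

```python
def combine_cf(x: float, y: float) -> float:
    """Combine two certainty factors in [-1, 1] bearing on the same hypothesis.

    Parallel-combination rule as commonly stated for MYCIN/EMYCIN; an
    illustrative assumption, not a transcription of Shortliffe (1976).
    """
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return x + y * (1 + x)
    # Mixed signs: undefined when one factor is 1.0 and the other -1.0.
    return (x + y) / (1 - min(abs(x), abs(y)))

print(round(combine_cf(0.6, 0.6), 4))    # 0.84
print(round(combine_cf(0.6, -0.4), 4))   # 0.3333
print(round(combine_cf(-0.5, -0.5), 4))  # -0.75
```

Two moderately confirming factors of 0.6 combine to 0.84; a disconfirming factor pulls the result back toward 0.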

Certainty factors are not probabilities. Not only is the range
wrong, but the rules of combination are inconsistent with the (Bayesian)
rules for the combination of probabilities. If they were to be used as
weighting factors in making decisions, in the same way that probabilities
are used, Ramsey's arguments could be used to show that the decisions
would not be rational: in a sense, the physician could have a book made
against him. (There is no suggestion that certainty factors should be
used this way; there is no suggestion of computing expectations based
on certainty factors and using these expectations for arriving at
decisions. But the Dutch book argument provides a reason for
eschewing these suggestions.)

Another approach to the treatment of uncertainty that has
received much attention in artificial intelligence is Shafer's (1976)
theory of belief functions. This is a clear mathematical theory, based on
earlier work of Arthur Dempster (1967; 1968). It is designed to
overcome some of the discomforts that people have felt concerning both
the subjectivistic Bayesian theories and their logical variants, as well as
frequency theories.
The basic building block of the theory of belief functions is the
frame of discernment Ω. A frame of discernment may be thought of
as a set of possible worlds, to use philosophers' jargon, but the worlds
need be construed in no more detail than concerns us in a given context. If I am
concerned with the outcome of a coin toss, there are only two possible
worlds that concern me: for example, one in which the coin lands heads,
and one in which it lands tails.

My beliefs are represented by an assignment of mass to sets of
possible worlds, including the possibility of assigning mass to unit sets
or singletons of possible worlds, and the possibility of assigning mass to
the set of all possible worlds. Masses are non-negative real numbers
between 0 and 1. The total mass assigned is 1.0.

A belief function, or support function, is a function whose
domain is sets of possible worlds (subsets of Ω), and whose values lie in
[0, 1]. For X ⊆ Ω, $\mathrm{Bel}(X) = \sum_{A \subseteq X} m(A)$, where m(A) is the mass assigned to
A ⊆ Ω, and the summation extends over all subsets A of X, including X
itself.

Bel(X) represents the amount of belief I have in the possibility
X. It is one of the attractive features of this system, as opposed to
classical probability systems, that I can have very little belief in X and
at the same time very little belief in its denial, which we denote by
~X: that is, instead of P(~X) = 1 - P(X), we can have both Bel(X)
= ε and Bel(~X) = ε for some small ε. To express complete ignorance about
everything, we can assign a mass of 1.0 to Ω, and a mass of 0 to every
proper subset of Ω.
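Representing subsets of Ω as Python frozensets (an implementation choice of ours), the definition of Bel becomes a one-line sum. The sketch checks the two features just described on the two-world frame {heads, tails}: complete ignorance gives every proper subset belief 0, and a mass function can leave little belief in both X and ~X at once. The masses in the second example are hypothetical.

```python
from fractions import Fraction

def bel(m, X):
    """Bel(X): the sum of m(A) over all subsets A of X, X itself included.

    m maps frozensets (subsets of the frame) to masses summing to 1.
    """
    return sum((mass for A, mass in m.items() if A <= X), Fraction(0))

H, T = frozenset({"heads"}), frozenset({"tails"})
OMEGA = H | T

# Complete ignorance: mass 1 on the whole frame, 0 on every proper subset.
ignorance = {OMEGA: Fraction(1)}
print(bel(ignorance, H), bel(ignorance, T), bel(ignorance, OMEGA))  # 0 0 1

# Little belief in X and little in ~X at once (hypothetical masses).
weak = {H: Fraction(1, 10), T: Fraction(1, 10), OMEGA: Fraction(4, 5)}
print(bel(weak, H), bel(weak, T))  # 1/10 1/10
```

In the second example Bel(H) + Bel(T) = 0.2; the remaining 0.8 is uncommitted, which a single probability function, with P(H) + P(T) = 1, cannot express.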

It is easy to see how attractive this can be. Somehow, to know
the probability of something is to know something; a probability of 0
represents, not ignorance, but certainty just as much as a probability
of 1. But a probability of a half doesn't seem to represent ignorance,
either. In the new system, belief in S equal to 0 may represent
ignorance; it does so if belief in the denial of S is also 0.

Let us now consider updating: the way belief functions and mass
functions change with the accumulation of evidence. "Evidence" is
construed as a frame of discernment with a belief function defined on
it. This represents what has happened to us, what we are taking account
of. If Ω consists of the six possible outcomes of a toss of a slightly
suspicious die, our belief function might assign masses of 0.1 to each of the six
singletons and a mass of 0.4 (representing ignorance) to Ω itself. Now let
us suppose we are told by a person of doubtful reliability that the toss
resulted in an odd number of spots. This might be represented as the
same frame of discernment with a mass of 0.7 on the set corresponding
to odd tosses, and 0.3 on Ω.

Our beliefs should now be represented as the result of
combining these two belief functions. (Note that we have not required
that the "evidence" be known with certainty.) The procedure is to
consider all the subsets of Ω that have mass according to either belief
function; if S bears positive mass $m_1(S)$ according to the first belief
function, and T bears positive mass $m_2(T)$ according to the second belief
function, then we assign a mass of $m_1(S) \times m_2(T)$ to the intersection of
S and T, provided that intersection is not empty. If it is empty, i.e., if
it represents an impossible state of affairs, such as the toss landing two
and also being odd, then we assign it 0.0. Products that fall on the same
intersection are added together. To account for the lost mass
and to get back to a canonical belief function, we normalize by dividing
each number by 1 - k, where k is the sum of the products of the masses of
subsets that are inconsistent with each other.

Thus we have, for our example: the mass assigned to the
intersection of 'one' and 'odd' is 0.1×0.7 = 0.07; the mass assigned to the
intersection of 'one' and Ω is 0.1×0.3 = 0.03; etc., all normalized to take
account of the impossibility of certain intersections. The following table
illustrates the procedure.


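Rows: masses from the die (0.1 on each face, 0.4 on Ω).
Columns: masses from the report (0.7 on odd, 0.3 on Ω).

                 odd          Ω
        1        0.1×0.7      0.1×0.3
        2        0.0          0.1×0.3
        3        0.1×0.7      0.1×0.3
        4        0.0          0.1×0.3
        5        0.1×0.7      0.1×0.3
        6        0.0          0.1×0.3
        Ω        0.4×0.7      0.4×0.3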

The normalizing number is 1 - 3×0.07 = 1 - 0.21 = 0.79. Thus we find
that the belief we should attribute to 'three' is (0.1×0.7 + 0.1×0.3)/0.79 = 0.127;
the belief we should attribute to 'odd' is 0.734; the mass left
on Ω (ignorance) is 0.152; etc.
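The combination procedure and the worked example can be reproduced mechanically. This is a minimal sketch of Dempster's rule of combination over mass functions stored as dictionaries from frozensets to exact fractions; the representation and the helper names `combine` and `bel` are ours, not Shafer's notation. `combine` returns the renormalized masses together with the conflict k.

```python
from fractions import Fraction
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for mass functions keyed by frozensets.

    Each pair of focal sets sends the product of its masses to their intersection;
    products on the same intersection are added. Products on an empty intersection
    make up the conflict k, and the surviving masses are divided by 1 - k.
    Returns (combined masses, k).
    """
    raw, k = {}, Fraction(0)
    for (S, a), (T, b) in product(m1.items(), m2.items()):
        C = S & T
        if C:
            raw[C] = raw.get(C, Fraction(0)) + a * b
        else:
            k += a * b
    if k == 1:
        raise ValueError("total conflict: the combination is undefined")
    return {C: v / (1 - k) for C, v in raw.items()}, k

def bel(m, X):
    """Bel(X): total mass on subsets of X."""
    return sum((v for A, v in m.items() if A <= X), Fraction(0))

OMEGA = frozenset(range(1, 7))
ODD = frozenset({1, 3, 5})

die = {frozenset({i}): Fraction(1, 10) for i in OMEGA}   # 0.1 on each face
die[OMEGA] = Fraction(2, 5)                              # 0.4 on Ω: ignorance
report = {ODD: Fraction(7, 10), OMEGA: Fraction(3, 10)}  # doubtful report of "odd"

combined, k = combine(die, report)
print(1 - k)                                            # 79/100
print(round(float(bel(combined, frozenset({3}))), 3))   # 0.127
print(round(float(bel(combined, ODD)), 3))              # 0.734
print(round(float(combined[OMEGA]), 3))                 # 0.152
```

The figures match the hand computation. Note that Bel(odd) counts the mass on {1}, {3}, and {5} as well as the mass on odd itself.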

There is a special case that corresponds to Bayesian
conditionalization. If our evidential belief function assigns mass 1.0 to
a single subset of Ω (and perforce 0 to every other subset of Ω), then we
may compute the updated belief in any subset of Ω by means of
what Shafer (1976, p. 67) calls Dempster's rule of conditioning. In this rule,
X is arbitrary, and B is the set corresponding to the evidence (we
assume that Bel(~B) < 1, so that the denominator is not zero):

$$\mathrm{Bel}(X \mid B) = \frac{\mathrm{Bel}(X \cup {\sim}B) - \mathrm{Bel}({\sim}B)}{1 - \mathrm{Bel}({\sim}B)}.$$
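Dempster's rule of conditioning can be checked against the mass-level description of updating. The sketch below uses the suspicious-die masses from the example above (0.1 on each face, 0.4 on Ω) and the hypothetical helper names `condition_by_formula` and `condition_by_masses`. It computes Bel(X | B) from the formula and, separately, by moving each mass to its intersection with B, discarding what becomes empty, and renormalizing; the two agree.

```python
from fractions import Fraction

def bel(m, X):
    """Bel(X): total mass on subsets of X."""
    return sum((v for A, v in m.items() if A <= X), Fraction(0))

def condition_by_formula(m, X, B, omega):
    """Dempster's rule of conditioning; assumes Bel(~B) < 1."""
    not_B = omega - B
    return (bel(m, X | not_B) - bel(m, not_B)) / (1 - bel(m, not_B))

def condition_by_masses(m, B):
    """Move each mass m(A) to A ∩ B, drop mass that lands on the empty set,
    and renormalize what remains."""
    out = {}
    for A, v in m.items():
        C = A & B
        if C:
            out[C] = out.get(C, Fraction(0)) + v
    total = sum(out.values())
    return {C: v / total for C, v in out.items()}

OMEGA = frozenset(range(1, 7))
ODD = frozenset({1, 3, 5})
die = {frozenset({i}): Fraction(1, 10) for i in OMEGA}
die[OMEGA] = Fraction(2, 5)

updated = condition_by_masses(die, ODD)
for X in (frozenset({3}), frozenset({1, 3}), frozenset({2}), ODD):
    print(sorted(X), condition_by_formula(die, X, ODD, OMEGA), bel(updated, X))
# [3] 1/7 1/7
# [1, 3] 2/7 2/7
# [2] 0 0
# [1, 3, 5] 1 1
```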


A simple support function is a belief function that results
from the assignment of mass to Ω and to a single subset of Ω. A
separable support function is a belief function that results from the
combination of a finite number of simple support functions. There are
other support functions, and indeed there are belief functions that are
not support functions, but the separable support functions represent
quite a broad class. It is therefore of interest to note that there is a
procedure for expanding Ω so that the result of updating by a simple
support function can be represented as an instance of Dempster's rule
of conditioning (Kyburg, 1987). It follows that updating by a separable
support function can be represented by a sequence of steps of Dempster
conditioning.

We have here a general and attractive procedure for representing and
updating uncertainty. It seems quite different from probability.
But one of the differences is not so nice: there is no obvious decision
procedure based on belief functions. In the case of any standard
subjective or logical probabilistic approach, we can apply the principle
of maximizing expected utility to decisions. Here we cannot.


The belief in odd is the sum of the measures assigned to 1, 3, and 5, each 0.1×0.7 + 0.1×0.3, for a total of 0.30, plus
the measure assigned to the general class, odd, by the new information, multiplied by the
non-specific assignment provided by the old information, 0.7×0.4. This sum, 0.58, is
normalized by dividing by 0.79, which yields 0.734.
IV. BELIEF FUNCTIONS AND PROBABILITIES


Dempster (1967; 1968) originally referred to "upper" and "lower"
probabilities. The idea, but not the terminology, is preserved in Shafer:
the belief function gives the lower probability; there is a dual notion,
plausibility, that corresponds to an upper probability. (The plausibility
of X, Pl(X), is defined to be 1 - Bel(~X).)

First, note that the space Ω of possibilities is just another way
of representing propositions or statements. A subset of Ω corresponds
to a statement. A probability function defined over Ω consists in the
assignment of a number to each complete description of a state of
affairs, that is, to each atomic possible world. A set of possible worlds,
corresponding to a disjunction of the atomic world descriptions, will
then receive as its measure the sum of the numbers assigned to its
atoms. The translation between statements and subsets of Ω is
straightforward.

Shafer's system does not require (but it allows) the assignment
of masses to the singletons (corresponding to the atomic worlds). We
can capture this aspect of the system by considering, not a single
assignment to the atomic worlds, but a set of assignments. For example,
consider a simple frame of discernment containing two states of affairs:
heads and tails. The subsets are ∅, which has mass 0, H =
{heads}, T = {tails}, and Ω = {heads, tails}. Let us, to reflect our
uncertainty about the coin, assign mass 0.4 to H and to T, and mass 0.2
to Ω. We can accomplish the same thing with a set of probability
functions: we can consider the set of all those classical probability
functions whose domain is {heads, tails}, and whose value for heads lies
between 0.4 and 0.6. For every function P in this set, P(tails) = 1 -
P(heads). Belief and plausibility are now most naturally thought of as
lower and upper probabilities, respectively: Bel(H) = 0.4 and
Pl(H) = 1 - Bel(T) = 0.6 are the least and greatest values of P(heads) in the set.

This  holds  quite  generally.  Given  any  belief  function  defined  on 
a  frame  of  discernment,  there  will  exist  a  set  of  classical  probability 
functions,  defined  on  the  same  set  of  possible  worlds,  with  the  property 
that for any subset X of the frame of discernment, the belief assigned to
X, Bel(X), is the minimum of the values P(X) for probability functions
P in that set, and the plausibility assigned to X, Pl(X) = 1 - Bel(¬X),
is the maximum. Furthermore, the set of probability functions with this
property is convex: If P and Q belong to the set of probability functions
in question, so does the function PQ, where PQ(X) = aP(X) + (1 - a)Q(X),
0 < a < 1.
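On a finite frame the general claim can be illustrated by "selection": hand the whole mass of each focal set to a single one of its members. The sketch below uses a hypothetical mass function on {a, b, c} (not from the text; helper names are invented here), enumerates every such selection, and checks that for each subset X the minimum of P(X) over the selections equals Bel(X) and the maximum equals Pl(X). Every selection belongs to the convex set associated with the belief function, and every extreme point of that set is a selection, so these extremes are the extremes over the whole set. A final check mixes two members and confirms that the mixture stays between Bel and Pl.

```python
from fractions import Fraction as F
from itertools import chain, combinations, product

OMEGA = ("a", "b", "c")

# A hypothetical mass function on a three-element frame (not from the text).
MASS = {
    frozenset("a"): F(2, 10),
    frozenset("ab"): F(3, 10),
    frozenset("bc"): F(1, 10),
    frozenset("abc"): F(4, 10),
}

def subsets(universe):
    """All subsets of the frame, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def bel(mass, x):
    return sum((m for s, m in mass.items() if s <= x), F(0))

def pl(mass, x):
    return sum((m for s, m in mass.items() if s & x), F(0))

def selections(mass):
    """Yield distributions obtained by giving each focal set's mass to one of its elements."""
    focal = list(mass.items())
    for choice in product(*(sorted(s) for s, _ in focal)):
        dist = {w: F(0) for w in OMEGA}
        for (_, m), w in zip(focal, choice):
            dist[w] += m
        yield dist

def prob(dist, x):
    return sum((dist[w] for w in x), F(0))

dists = list(selections(MASS))
for x in subsets(OMEGA):
    values = [prob(d, x) for d in dists]
    assert min(values) == bel(MASS, x), x   # Bel is the lower probability
    assert max(values) == pl(MASS, x), x    # Pl is the upper probability

# Convexity: a mixture of two members is again a probability function within [Bel, Pl].
p, q, a = dists[0], dists[-1], F(1, 3)
mix = {w: a * p[w] + (1 - a) * q[w] for w in OMEGA}
assert sum(mix.values()) == 1
assert all(bel(MASS, x) <= prob(mix, x) <= pl(MASS, x) for x in subsets(OMEGA))
print(len(dists), "selections checked")
```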

Surprisingly,  the  converse  relation  does  not  hold.  There  are 
sets  of  probability  functions  to  which  there  corresponds  no  belief 
function.  Furthermore,  these  examples  need  not  be  bizarre. 

Consider a compound experiment³ consisting of performing a
mixture,  in  unknown  ratio  p,  of  two  experiments:  (1)  tossing  a  fair  coin 
twice,  or  (2)  drawing  a  coin  from  a  bag  containing  60%  two-headed  and 
40%  two-tailed  coins,  and  tossing  it  twice.  The  outcomes  of  the 
compound  experiment  that  interest  us  are  A,  the  event  that  the  first  toss 
lands heads, and B, the event that the second toss lands tails. CP is to
be the convex set of possible distributions of outcomes on the compound
experiment:

$$CP = \{\langle \tfrac{1}{4}p + 0.6(1-p),\ \tfrac{1}{4}p,\ \tfrac{1}{4}p,\ \tfrac{1}{4}p + 0.4(1-p) \rangle : p \in [0,1]\}.$$

This is a set of quadruples. The first component is the frequency
of HH, the second of HT, the third of TH, and the fourth of TT, on an
arbitrarily large number of repetitions of the compound experiment.
We  are  representing  our  knowledge  of  the  long-run  outcomes  of  the 
experiment  by  a  convex  set  of  probability  distributions.  We  call  this  the 
convex  set  representation. 
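The set CP can be written down directly. The sketch below (illustrative only; the function name cp_member is invented here) builds the quadruple for HH, HT, TH, TT from the mixing ratio p, checks that each member is a probability distribution, and checks that mixing two members with weight a yields the member with ratio a·p1 + (1 - a)·p2. Since each member is affine in p, this is why the set is convex.

```python
from fractions import Fraction as F

OUTCOMES = ("HH", "HT", "TH", "TT")

def cp_member(p):
    """Distribution over two tosses: with probability p a fair coin is tossed twice;
    with probability 1 - p a coin is drawn from a bag of 60% two-headed and
    40% two-tailed coins and tossed twice."""
    fair = {"HH": F(1, 4), "HT": F(1, 4), "TH": F(1, 4), "TT": F(1, 4)}
    bag = {"HH": F(3, 5), "HT": F(0), "TH": F(0), "TT": F(2, 5)}
    return {o: p * fair[o] + (1 - p) * bag[o] for o in OUTCOMES}

# Every member on a grid of mixing ratios is a probability distribution.
for k in range(11):
    d = cp_member(F(k, 10))
    assert sum(d.values()) == 1 and all(v >= 0 for v in d.values())

# Mixing two members of CP gives another member of CP.
p1, p2, a = F(1, 5), F(9, 10), F(1, 4)
mixed = {o: a * cp_member(p1)[o] + (1 - a) * cp_member(p2)[o] for o in OUTCOMES}
assert mixed == cp_member(a * p1 + (1 - a) * p2)
print([str(v) for v in cp_member(F(1, 2)).values()])
```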

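Let $P_*(X)$ be the least value of $P(X)$ for $P \in CP$. Then
$P_*(A \cup B) < P_*(A) + P_*(B) - P_*(A \cap B)$: the left side is 0.75
(at $p = 1$), while $P_*(A) = 0.5$, $P_*(B) = 0.4$, and $P_*(A \cap B) = 0$,
so the right side is 0.9. We would like to identify $P_*(X)$ with
$\mathrm{Bel}(X)$. But one of Shafer's (1976, pp. 38-39) theorems requires that for
all belief functions, $\mathrm{Bel}(A \cup B) \ge \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A \cap B)$. This
shows that $P_*$ cannot be a belief function. We cannot represent this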
uncertain  situation  by  belief  functions,  but  the  convex  set  representation 
is  quite  straightforward  and  intuitive. 
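The failure of Shafer's inequality can be verified numerically. In the sketch below (illustrative; names invented here), A is {HH, HT} and B is {HT, TT}. Every P(X) is linear in p, so its minimum over CP is attained at p = 0 or p = 1; the code evaluates both endpoints and confirms that the lower probability of A ∪ B falls strictly short of the bound that any belief function must meet.

```python
from fractions import Fraction as F

def cp_member(p):
    """Quadruple (HH, HT, TH, TT) for mixing ratio p, as defined in the text."""
    return {
        "HH": F(1, 4) * p + F(3, 5) * (1 - p),
        "HT": F(1, 4) * p,
        "TH": F(1, 4) * p,
        "TT": F(1, 4) * p + F(2, 5) * (1 - p),
    }

# P(X) is linear in p, so its extremes over p in [0, 1] lie at the endpoints.
EXTREME_POINTS = [cp_member(F(0)), cp_member(F(1))]

def lower(event):
    """P_*(event): the least probability of the event over CP."""
    return min(sum(d[o] for o in event) for d in EXTREME_POINTS)

A = {"HH", "HT"}   # first toss lands heads
B = {"HT", "TT"}   # second toss lands tails

lhs = lower(A | B)
rhs = lower(A) + lower(B) - lower(A & B)
print("P_*(A u B) =", lhs, "; P_*(A) + P_*(B) - P_*(A n B) =", rhs)
# A belief function would need lhs >= rhs; here the inequality fails.
assert lhs < rhs
```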

Both  representations  are  of  interest,  however.  The  belief 
function  representation  is  an  easy  one  to  manipulate;  the  convex  set 
representation  is  difficult  to  deal  with  computationally.  The  convex  set 
representation is intuitively clear; the belief function representation
seems artificial. Furthermore, the two representations are mutually
enlightening.

As an example, let us consider updating in the light of new
evidence. In the convex set representation, we can represent classical
Bayesian conditionalization. Given a single probability function P, the
conditional probability of a hypothesis H on evidence E, when P(E) >
0, is P(H | E) = P(H&E)/P(E). If E represents our total increment of
evidence, the principle of confirmational conditionalization (Isaac
Levi, 1980) directs us to adopt as our new credence function the function
P' given by P'(H) = P(H&E)/P(E).

³ This example was suggested by Teddy Seidenfeld in conversation.

Given  a  set  of  probability  distributions  CP,  we  can  accomplish 
the  same  end.  Let  CP  be  a  convex  set  of  classical  probability  functions. 
Let  our  total  new  evidence  be  E.  Then  our  new  belief  state  should  be 
represented by CP', where CP' is the set of probability functions of the
form  P(H&E)/P(E)  =  P(H  |  E),  for  P  in  the  set  CP,  and  P(E)  >  0.  It 
turns  out  that  when  CP  is  convex,  and  there  is  at  least  one  function  P  in 
CP  such  that  P(E)  >  0,  CP'  is  convex,  too. 
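Conditioning a convex set can be sketched by conditioning its extreme points. The code below relies on a standard fact not stated in the text: P(H | E) is a ratio of two quantities linear in P, so when P(E) > 0 throughout, its extremes over the set are attained at extreme points. It reuses the set CP defined above, with a hypothetical hypothesis H (first toss heads) and hypothetical evidence E (the two tosses agree); the function names are invented for illustration.

```python
from fractions import Fraction as F

def cp_member(p):
    """Quadruple (HH, HT, TH, TT) for mixing ratio p, as defined in the text."""
    return {
        "HH": F(1, 4) * p + F(3, 5) * (1 - p),
        "HT": F(1, 4) * p,
        "TH": F(1, 4) * p,
        "TT": F(1, 4) * p + F(2, 5) * (1 - p),
    }

def prob(dist, event):
    return sum((dist[o] for o in event), F(0))

def condition(dist, evidence):
    """Bayesian conditionalization: P(. | E) = P(. & E) / P(E); requires P(E) > 0."""
    pe = prob(dist, evidence)
    if pe == 0:
        raise ValueError("P(E) = 0: conditioning undefined")
    return {o: (dist[o] / pe if o in evidence else F(0)) for o in dist}

H = {"HH", "HT"}  # hypothetical hypothesis: first toss lands heads
E = {"HH", "TT"}  # hypothetical evidence: the two tosses agree

# Condition the extreme points of CP (p = 0 and p = 1).
posterior = [prob(condition(cp_member(p), E), H) for p in (F(0), F(1))]
lo, hi = min(posterior), max(posterior)
print("P(H | E) ranges over [", lo, ",", hi, "]")

# Sanity check on interior members: every conditioned value lies in [lo, hi].
for k in range(101):
    v = prob(condition(cp_member(F(k, 100)), E), H)
    assert lo <= v <= hi
```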

Now  a  belief  function  can  be  represented  by  a  convex  set  of 
probability  functions  (but  not  vice  versa),  and,  when  E  is  a  piece  of 
evidence  we  learn  for  certain,  we  can  apply  both  Dempster  conditioning 
and  confirmational  conditionalization.  It  turns  out  that  Dempster 
conditioning  imposes  tighter  constraints  on  our  degrees  of  belief  than 
does confirmational conditionalization. Writing Bel(X | Y) for the
updated belief function and Pl(X | Y) for the updated plausibility
function,  we  have  the  following  relation,  where  the  infimum  (inf)  and 
supremum  (sup)  are  taken  over  the  set  of  functions  CP'  (Kyburg,  1987): 


$$\inf P(H \mid E) \le \mathrm{Bel}(H \mid E) \le \mathrm{Pl}(H \mid E) \le \sup P(H \mid E). \tag{2}$$


In  this,  equality  holds  only  in  rather  special  cases,  when  certain 
distributions  are  ruled  out  as  impossible  by  all  the  P’s  in  CP. 
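A small example shows (2) holding strictly. The sketch below uses a hypothetical mass function on {a, b, c} (0.5 on {a, c}, 0.5 on {b, c}), evidence E = {a, b}, and hypothesis H = {a}; none of these come from the text. Dempster conditioning moves each focal set's mass to its intersection with E and renormalizes. Confirmational conditionalization is computed by conditioning every selection distribution (each focal set's mass handed to one of its members) with P(E) > 0 and taking the extremes. The run gives inf P(H | E) = 0 < Bel(H | E) = Pl(H | E) = 0.5 < sup P(H | E) = 1: thoroughly ambiguous evidence produces an unambiguous updated belief.

```python
from fractions import Fraction as F
from itertools import product

OMEGA = ("a", "b", "c")
MASS = {frozenset("ac"): F(1, 2), frozenset("bc"): F(1, 2)}  # hypothetical masses
E = frozenset("ab")  # hypothetical evidence
H = frozenset("a")   # hypothetical hypothesis

def bel(mass, x):
    return sum((m for s, m in mass.items() if s <= x), F(0))

def pl(mass, x):
    return sum((m for s, m in mass.items() if s & x), F(0))

def dempster_condition(mass, evidence):
    """Dempster conditioning: move each focal set's mass to its intersection
    with the evidence, drop mass landing on the empty set, renormalize."""
    out = {}
    for s, m in mass.items():
        t = s & evidence
        if t:
            out[t] = out.get(t, F(0)) + m
    total = sum(out.values(), F(0))
    if total == 0:
        raise ValueError("evidence has plausibility 0")
    return {s: m / total for s, m in out.items()}

def selections(mass):
    """Distributions giving each focal set's mass to a single element of it."""
    focal = list(mass.items())
    for choice in product(*(sorted(s) for s, _ in focal)):
        dist = {w: F(0) for w in OMEGA}
        for (_, m), w in zip(focal, choice):
            dist[w] += m
        yield dist

def conditional(dist, h, e):
    """P(h | e), or None when P(e) = 0."""
    pe = sum(dist[w] for w in e)
    return None if pe == 0 else sum(dist[w] for w in h & e) / pe

cond = dempster_condition(MASS, E)
values = [v for v in (conditional(d, H, E) for d in selections(MASS)) if v is not None]
inf_p, sup_p = min(values), max(values)
bel_d, pl_d = bel(cond, H), pl(cond, H)
print("inf P(H|E) =", inf_p, " Bel(H|E) =", bel_d, " Pl(H|E) =", pl_d, " sup P(H|E) =", sup_p)
assert inf_p <= bel_d <= pl_d <= sup_p
```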

One response to this fact would be to welcome the "stronger" results
that the belief function form of updating yields, compared with generalized
Bayes. I believe that this response would be mistaken. We have given
no specific interpretation to the members of CP. In particular, they may
be purely objective chances or frequencies, or they may (as I would
usually construe them) be epistemic probabilities directly based on
knowledge of frequencies or chances. In either case, inf P(X) may be
the actual value of a frequency or a chance. In adopting Bel(H | E)
as your odds-determining measure, you may be ruling out this possibility
groundlessly. This corresponds to a well-known difficulty in the theory
of belief functions, namely that very ambiguous evidence can lead to
unambiguous belief functions, in which Bel(X) = Pl(X) (see Lotfi [...], 1979).

I refer to this as a difficulty, but of course whether it is or not
depends in part on what is at stake. One can imagine circumstances in
which the greater precision afforded by Dempster conditioning more
than offsets the security provided by conditionalization. For example,
in a situation in which the agent is forced to make book with all comers,
and in which the real distributions in the world are unimodal, and in
which the decision rule has any of a number of plausible forms, it is
[...] that someone following Dempster conditioning will probably (!)
come out ahead of someone who follows classical conditioning.

We can also raise the question of whether or not
conditionalization is itself rational. There have been a number of
arguments in favor of confirmational conditionalization (Paul Teller,
1976; Bas van Fraassen, 1984). We do not find these arguments
persuasive, and in fact have argued against them in Kyburg (1987) and
in Bacchus et al. (1989).

But  what  are  the  plausible  forms  of  a  decision  rule?  The 
relation between a representation by convex sets of probability functions
and  Shafer’s  representation  by  belief  functions  gives  us  a  handle  on  this 
question,  but  it  is  by  no  means  settled. 


V. DECISION THEORY


One of the most attractive features of classical probability (and
indeed what the whole approach of subjectivistic probability is
based on) is that it lends itself to a very simple and persuasive decision
rule:  Maximize  Thy  Expected  Utility.  At  the  same  time,  one  of  the 
interesting  aspects  of  any  alternative  to  a  single  classical  probability 
function  as  a  representation  of  belief  is  the  way  in  which  it  lends  itself 
to  some  form  of  decision  theory. 

The  close  relation  between  belief  functions  and  convex  sets  of 
classical probability functions suggests relations between the decision
theory  appropriate  for  sets  of  classical  probability  functions  and  the 
decision  theory  appropriate  for  belief  functions.  But  what  is  the 
decision  theory  appropriate  for  sets  of  probability  functions? 
In the first place we can apply the classical Bayesian procedure.
When  we  have  intervals  of  probability,  we  can  consider  the  maximum 
expected  utility  and  the  minimum  expected  utility  of  one  decision,  and 
the  maximum  and  minimum  expected  utility  of  another  decision.  If  the 
minimum  expected  utility  of  one  decision  exceeds  the  maximum 
expected  utility  of  another  decision,  we  have  a  clear  ordering  of  those 
two  decisions. 

More  generally  and  more  precisely,  let  us  say  one  decision 
dominates  another  when  the  minimum  expected  utility  of  the  first 
exceeds  the  maximum  expected  utility  of  the  second.  In  that  case  we 
clearly  have  nothing  to  lose  if  we  forget  about  the  second  possibility. 
So,  on  perfectly  classical  grounds,  we  can  ignore  dominated  alternatives. 
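The dominance test just described can be sketched as follows. The three actions, the two states X and Y, the utilities, and the probability interval for X are all invented for illustration. Expected utility is linear in the probability of X, so its minimum and maximum over an interval of probabilities are found at the endpoints.

```python
from fractions import Fraction as F

# Hypothetical example: the probability of state X lies in [0.3, 0.5]; Y has the rest.
P_X_BOUNDS = (F(3, 10), F(1, 2))

# Hypothetical utilities: action -> (utility if X, utility if Y).
UTILITIES = {
    "act_1": (F(10), F(8)),
    "act_2": (F(2), F(6)),
    "act_3": (F(12), F(6)),
}

def expected_utility(u, p_x):
    ux, uy = u
    return p_x * ux + (1 - p_x) * uy

def eu_range(u):
    """Minimum and maximum expected utility; EU is linear in p_x, so check the endpoints."""
    values = [expected_utility(u, p) for p in P_X_BOUNDS]
    return min(values), max(values)

def dominates(a, b):
    """a dominates b when the least expected utility of a exceeds the greatest of b."""
    return eu_range(UTILITIES[a])[0] > eu_range(UTILITIES[b])[1]

# Discard dominated alternatives; whatever remains must be decided some other way.
undominated = [b for b in UTILITIES
               if not any(dominates(a, b) for a in UTILITIES if a != b)]
for act, u in UTILITIES.items():
    print(act, [float(v) for v in eu_range(u)])
print("undominated:", undominated)
```

Here act_2 is dominated, while the expected-utility intervals of act_1 and act_3 overlap, so dominance alone cannot choose between them.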

Beyond this, the decision theory for convex sets of classical
probability functions reflects classical decision-theoretic problems. It
is a theory that should take account of indeterminacy (as opposed to
uncertainty), but how to do this is an open question. In classical terms,
if X and Y are outcomes that are both possible, and the utility of action
A exceeds that of action B if X is the case but the opposite holds if Y
is the case, we are faced with an indeterminate situation unless we know
the probabilities that X is the case and that Y is the case.

In  such  cases  there  are  various  rules  that  one  might  apply. 
Minimax  is  one,  minimax  regret  another.  Levi  (1980)  has  explored  a 
lexical  approach  based  on  a  sequence  of  notions  of  admissibility.  There 
are  no  doubt  any  number  of  alternatives,  almost  none  of  which  have 
been  adequately  discussed.  It  is  not  my  purpose  here  to  defend  one 
particular  approach  to  decision  under  these  circumstances,  but  merely 
to  point  out  the  relevance  of  classical  decision  theory  to  the  case  in 
which  uncertainty  is  represented  by  belief  functions.  The  claim  that 
there  is  no  decision  theory  to  go  with  the  uncertainty  representation  of 
belief  functions  is  clearly  wrong.  But  there  is  no  decision  theory  for 
these  cases  on  which  all  reasonable  persons  agree. 
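To make the point concrete, the sketch below applies two of the rules just mentioned, minimax (read here as choosing the act with the best worst-case utility) and minimax regret, to a hypothetical two-state problem of the kind described above, where A is better if X obtains and B is better if Y obtains. It adds one further candidate not discussed in the text, maximizing the lower expected utility over a hypothetical interval for the probability of X (often called Gamma-maximin). All numbers are invented; the only point is that reasonable rules can disagree. Levi's lexical approach is not attempted here.

```python
from fractions import Fraction as F

# Hypothetical payoffs: A is better if X obtains, B is better if Y obtains.
STATES = ("X", "Y")
UTILITY = {"A": {"X": F(10), "Y": F(2)},
           "B": {"X": F(4), "Y": F(5)}}
P_X_BOUNDS = (F(3, 10), F(1, 2))  # hypothetical interval for the probability of X

def minimax(utility):
    """Choose the act whose worst-case utility is greatest."""
    return max(utility, key=lambda a: min(utility[a].values()))

def minimax_regret(utility):
    """Choose the act whose largest regret (shortfall from the best act in a state) is least."""
    best = {s: max(utility[a][s] for a in utility) for s in STATES}
    return min(utility, key=lambda a: max(best[s] - utility[a][s] for s in STATES))

def maximin_expected_utility(utility, bounds):
    """Choose the act with the greatest lower expected utility over the interval;
    the endpoints suffice because expected utility is linear in P(X)."""
    def lower_eu(a):
        return min(p * utility[a]["X"] + (1 - p) * utility[a]["Y"] for p in bounds)
    return max(utility, key=lower_eu)

print("minimax:", minimax(UTILITY))
print("minimax regret:", minimax_regret(UTILITY))
print("maximin expected utility:", maximin_expected_utility(UTILITY, P_X_BOUNDS))
```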

No  more,  of  course,  have  the  classical  issues  of  decision  in  the 
face  of  uncertainty  been  solved.  But  it  is  significant  that  for  the  classical 
problem  there  are  a  number  of  alternatives  that  are  considered  worthy 
of  serious  discussion.  Equally,  for  the  convex  probability  case,  or  the 
belief  function  case,  these  alternatives  should  receive  serious 
consideration.  It  is  hoped  that  further  consideration  will  reveal  some 
principles  that  will  enlighten  our  decision-theoretic  concerns.  In  any 
event it is clear that there is a decision-theoretic framework that is
applicable in the belief function framework, and it is also clear that its
application is not a matter whose principles are entirely settled.